Methods and systems for determining image processing operations relevant to particular imagery

ABSTRACT

Image data, such as from a mobile phone camera, is analyzed to determine a colorfulness metric (e.g., saturation) or a contrast metric (e.g., Weber contrast). This metric is then used in deciding which of, or in which order, plural different image recognition processes should be invoked in order to present responsive information to a user. A great number of other features and arrangements are also detailed.

RELATED APPLICATION DATA

This application is a continuation of application Ser. No. 12/821,974, filed Jun. 23, 2010 (now U.S. Pat. No. 8,660,355), which claims priority to provisional application 61/315,475, filed Mar. 19, 2010 (which is attached as an Appendix, and forms part of this specification).

TECHNICAL FIELD

The present technology relates to image processing, and more particularly relates to selection of an image processing operation(s) that may be appropriate to a particular set of image data.

BACKGROUND AND SUMMARY

U.S. Pat. No. 6,405,925 and application 20070278306 (both to Symbol Technologies) detail imager-based barcode readers (as opposed to laser-based). These references particularly concern methods for identifying barcodes—and their specific types—in the context of other imagery. In an exemplary arrangement, contrast statistics and directional vectors associated with detected edge lines are used to identify what sub-region(s), if any, of the image data likely corresponds to a barcode. A barcode decoder then processes any thus-identified image sub-region(s) to extract a payload.

Since these references concern dedicated barcode readers, they are not designed for more general purpose image processing. In more general arrangements, consideration may be given to barcodes that might not be characterized by high contrast edges (e.g., barcodes that are in “soft” focus), and other image scenes that might present high contrast linear edges, yet are not barcodes (e.g., a white picket fence against a blue sky background).

Google, in its U.S. Pat. No. 7,565,139, teaches a system that processes input imagery by applying multiple recognition processes, e.g., optical character recognition (OCR), object recognition, and facial recognition. Each process produces a confidence score with its results. If the facial recognition confidence score is higher than the other scores, then the image is presumed to be a face, and those results are used for further processing. If the OCR score is the highest, the image is presumed to depict text, and is treated on that basis. Etc.

It will be recognized that this is a brute force approach—trying all possible recognition processes in order to get a useful result. Indeed, the processing is performed by a remote server, since timely execution of the various involved algorithms is apparently beyond the capabilities of mobile platforms.

Pixto (since acquired by Nokia) teaches a more sophisticated approach to mobile visual query in its application 20080267504. In the Pixto arrangement, a mobile handset obtains GPS information to determine the geographical context in which imagery is captured. If the handset is found to be in a shopping mall, a barcode recognition process is preferentially applied to captured image data. If the handset is found to be outdoors, an object recognition process may be most appropriate. (The phone may load an object glossary emphasizing local points of interest, e.g., the Statue of Liberty in New York Harbor.) A set of rules, based on location context, is thus applied to determine what image recognition processing should be performed. (Pixto also teaches looking for stripes in imagery to indicate barcodes, and looking for regions of high spatial frequency content as possibly indicating text.)

In accordance with certain embodiments of the present technology, drawbacks associated with the foregoing approaches are overcome, and new features are provided.

In one particular embodiment, color saturation of input image data is used as a metric to discriminate whether a first set of image recognition processes (e.g., object or facial recognition) is more likely to be relevant than a second set of image recognition processes (e.g., OCR or barcode reading). Such classification technique can be used in conjunction with other known arrangements, including those taught in the references noted above, to improve their performance and usefulness.

The foregoing and other features and advantages of the present technology will be more apparent from the following detailed description, which proceeds with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

(FIGS. 1-24 are reproduced from priority application 61/315,475.)

FIG. 1 shows an embodiment employing certain aspects of the present technology, in an architectural view.

FIG. 2 is a diagram illustrating involvement of a local device with cloud processes.

FIG. 3 maps features of a cognitive process, with different aspects of functionality—in terms of system modules and data structures.

FIG. 4 illustrates different levels of spatial organization and understanding.

FIGS. 5, 5A and 6 show data structures that can be used in making composition of services decisions.

FIGS. 7 and 8 show aspects of planning models known from artificial intelligence, and employed in certain embodiments of the present technology.

FIG. 9 identifies four levels of concurrent processing that may be performed by the operating system.

FIG. 10 further details these four levels of processing for an illustrative implementation.

FIG. 11 shows certain aspects involved in discerning user intent.

FIG. 12 depicts a cyclical processing arrangement that can be used in certain implementations.

FIG. 13 is another view of the FIG. 12 arrangement.

FIG. 14 is a conceptual view depicting certain aspects of system operation.

FIGS. 15 and 16 illustrate data relating to recognition agents and resource tracking, respectively.

FIG. 17 shows a graphical target, which can be used to aid machine understanding of a viewing space.

FIG. 18 shows aspects of an audio-based implementation.

FIGS. 19 and 19A show a variety of possible user interface features.

FIG. 19B shows the lower pane, map portion, of FIG. 19 in greater detail.

FIGS. 20A and 20B illustrate a method of object segmentation using thresholded blobs.

FIGS. 21A, 21B and 22 show other exemplary user interface features.

FIGS. 23A and 23B show a radar feature in a user interface.

FIG. 24 serves to detail other user interface techniques.

FIGS. 25-27 detail methods according to the present technology.

DETAILED DESCRIPTION

(The Appendix details illustrative embodiments and methods in which the presently-described technology can be utilized, and provides further information about exemplary implementations.)

In accordance with certain embodiments of the present technology, captured imagery is examined for colorfulness (e.g., color saturation). This may be done by converting red/green/blue signals from the camera into another representation in which color is represented separately from luminance (e.g., CIELAB). In this latter representation, the imagery can be examined to determine whether all, or a significant spatial area (e.g., more than 20%, 50%, 90%, etc.), of the image frame is notably low in color (e.g., saturation less than 50%, 15%, 5%, etc.). If this condition is met, then the system can infer that it is likely looking at printed material, such as a barcode or text, and can activate recognition agents tailored to such materials (e.g., barcode decoders, optical character recognition processes, etc.). Similarly, this low-color circumstance can signal that the device need not apply certain other recognition techniques, e.g., facial recognition and watermark decoding.
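The low-color test described above can be sketched in a few lines. This is a minimal illustration only, not the implementation of any particular embodiment. It substitutes the HSV saturation channel from Python's standard colorsys module for the CIELAB representation mentioned above, and the function names and default thresholds (15% saturation over more than 50% of the frame) are hypothetical choices taken from the example ranges.

```python
import colorsys


def low_color_fraction(pixels, sat_threshold=0.15):
    """Return the fraction of pixels whose saturation is below sat_threshold.

    pixels: iterable of (r, g, b) tuples with 8-bit components (0-255).
    Saturation here is HSV saturation (an assumption of this sketch),
    not a CIELAB chroma value.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("no pixels supplied")
    low = 0
    for r, g, b in pixels:
        # rgb_to_hsv expects components in the range 0.0-1.0.
        _, saturation, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if saturation < sat_threshold:
            low += 1
    return low / len(pixels)


def looks_like_printed_material(pixels, area_fraction=0.5, sat_threshold=0.15):
    """True if more than area_fraction of the frame is notably low in color."""
    return low_color_fraction(pixels, sat_threshold) > area_fraction
```

A frame for which looks_like_printed_material returns True would favor barcode and OCR agents. Any other result would leave facial recognition and watermark decoding as candidates.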

Contrast is another image metric that can be applied similarly (e.g., printed text and barcodes are usually high contrast). In this case, a contrast measurement (e.g., RMS contrast, Weber contrast, etc.) in excess of a threshold value can trigger activation of barcode- and text-related agents, and can bias other recognition agents (e.g., facial recognition and watermark decoding) towards not activating.
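The two contrast measurements named above can be computed as follows. This is a sketch under stated assumptions: luminance is approximated with the common Rec. 601 luma weights, RMS contrast is taken as the standard deviation of normalized luminance, and the function names and the 0.3 threshold are hypothetical.

```python
import math


def luminance(rgb):
    """Approximate luminance in 0.0-1.0 from an 8-bit (r, g, b) tuple."""
    r, g, b = rgb
    return (0.299 * r + 0.587 * g + 0.114 * b) / 255.0


def rms_contrast(pixels):
    """RMS contrast: standard deviation of normalized pixel luminance."""
    lum = [luminance(p) for p in pixels]
    if not lum:
        raise ValueError("no pixels supplied")
    mean = sum(lum) / len(lum)
    return math.sqrt(sum((v - mean) ** 2 for v in lum) / len(lum))


def weber_contrast(feature_lum, background_lum):
    """Weber contrast: (I - I_b) / I_b for a feature against its background."""
    if background_lum == 0:
        raise ValueError("background luminance must be non-zero")
    return (feature_lum - background_lum) / background_lum


def is_high_contrast(pixels, threshold=0.3):
    """True if RMS contrast exceeds the (hypothetical) threshold value."""
    return rms_contrast(pixels) > threshold
```

Black print on white paper yields an RMS contrast near 0.5 under this definition, while a uniformly gray frame yields a value near zero.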

Conversely, if captured imagery is high in color or low in contrast, this can bias barcode and OCR agents not to activate, and can instead bias facial recognition and watermark decoding agents towards activating.

Thus, gross image metrics can be useful discriminants, or filters, in helping decide what different types of processing should be applied to captured imagery.

In other embodiments, other metrics can of course be used—such as the high frequency content test of Pixto, or the linear edges used by Symbol. The absence of high frequency content and linear edges, for example, can elevate the execution priority of a facial recognition algorithm over alternatives such as OCR and barcode decoding.

Likewise, some embodiments can employ other context data in deciding what recognition process to employ. Location, as taught by Pixto, is one, but there are many others.

Some devices capture a stream of images, e.g., to show a moving real-time camera image on the display of a mobile phone. The image metric(s) may be computed based on one frame of image data, and the recognition process determined with reference to that metric can be applied to one or more subsequent frames of image data.
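The frame-stream arrangement just described might be organized as shown below. This is an illustrative sketch only. The function name, the callable parameters, and the refresh_every interval are hypothetical, and for simplicity the selected process is applied to the frame on which the metric was computed as well as to the frames that follow it.

```python
def recognize_stream(frames, compute_metric, select_process, refresh_every=10):
    """Apply a recognition process to each frame of a stream.

    compute_metric(frame) returns an image metric for one frame.
    select_process(metric) returns a callable recognition process.
    The metric is recomputed only every refresh_every frames; the process
    chosen from it is reused for the frames in between.
    """
    results = []
    process = None
    for index, frame in enumerate(frames):
        if index % refresh_every == 0:
            # Compute the metric on this frame and pick a process from it.
            process = select_process(compute_metric(frame))
        results.append(process(frame))
    return results
```

Computing the metric on only some frames keeps the per-frame cost low, which can matter on a mobile processor.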

In some implementations, the calculated metric is not absolutely determinative of the recognition process that should be used. Instead, it is used as one factor among many in deciding what process to apply, or in which order plural candidate processes may be successively applied until a positive recognition result is achieved. A rule-based approach can be employed, in which several inputs are checked for compliance with different conditions, to determine the appropriate action. For example, if color saturation is below a reference value S1, and high frequency content is above a reference value HF1, then apply an OCR process (or apply it first and a barcode process second). If color saturation is below S1, and high frequency content is below HF1, then apply a barcode process first. If color saturation is above S1, and high frequency content is above HF1, then apply object recognition first. And if color saturation is above S1, and high frequency content is below HF2, then apply facial recognition first.

The foregoing example is naturally simplified. In typical implementations, more complex rules may be used, involving a variety of different reference values or other conditions.
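The four example rules above can be written out directly. The sketch below is illustrative only: the function and agent names are hypothetical, HF2 defaults to HF1 when no separate value is supplied (an assumption of this sketch), and an empty list is returned when no rule applies, such as when an input exactly equals a reference value.

```python
def order_recognition_processes(saturation, high_freq, s1, hf1, hf2=None):
    """Return the recognition processes to try, in order, per the example rules."""
    if hf2 is None:
        hf2 = hf1  # assumption: a single high-frequency reference value
    if saturation < s1 and high_freq > hf1:
        return ["ocr", "barcode"]
    if saturation < s1 and high_freq < hf1:
        return ["barcode"]
    if saturation > s1 and high_freq > hf1:
        return ["object_recognition"]
    if saturation > s1 and high_freq < hf2:
        return ["facial_recognition"]
    return []  # no rule matched; the caller may fall back to other factors


def apply_in_order(order, agents, image):
    """Apply candidate processes successively until one gives a positive result.

    agents maps a process name to a callable that returns None on failure.
    Returns (name, result) for the first success, or None if all fail.
    """
    for name in order:
        result = agents[name](image)
        if result is not None:
            return name, result
    return None
```

For instance, with S1 and HF1 both set to 0.5, a frame with saturation 0.1 and high frequency content 0.9 would be passed to the OCR agent first and the barcode agent second.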

FIGS. 25-27 show aspects of various of the foregoing methods.

As suggested above, the computed metric can also serve as a biasing factor, helping tip a decision that may be based on other factors in one direction or another.
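One way to picture such biasing is as an adjustment to scores that other factors have already assigned to each recognition agent. The following sketch is hypothetical throughout: the score representation, the agent names, and the weight value are assumptions made for illustration.

```python
PRINT_AGENTS = {"barcode", "ocr"}
NATURAL_AGENTS = {"facial_recognition", "watermark"}


def bias_agent_scores(base_scores, low_color, high_contrast, weight=0.25):
    """Nudge per-agent scores using the gross image metrics.

    base_scores: dict of agent name -> score derived from other factors.
    low_color, high_contrast: booleans from the colorfulness/contrast tests.
    Each metric casts one vote: low color and high contrast favor barcode
    and OCR agents; high color and low contrast favor facial recognition
    and watermark decoding. Agents in neither group are left unchanged.
    """
    vote = (1 if low_color else -1) + (1 if high_contrast else -1)
    biased = {}
    for name, score in base_scores.items():
        if name in PRINT_AGENTS:
            biased[name] = score + weight * vote
        elif name in NATURAL_AGENTS:
            biased[name] = score - weight * vote
        else:
            biased[name] = score
    return biased
```

When the two metrics disagree, the votes cancel and the decision rests entirely on the other factors.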

It will be understood that a mobile phone processor, operating in accordance with software instructions stored in memory, can perform all the acts required by the present technology. Or some/all acts can be performed by dedicated hardware in the mobile phone, or by processors/hardware at remote locations.

The specification provided in the Appendix details further technology that can be used in conjunction with the above-described arrangements.

To provide a comprehensive disclosure without unduly lengthening this specification, applicant incorporates by reference—in their entireties—the documents referenced herein. 

1. A method comprising the acts: analyzing received image data to determine a colorfulness metric, a color saturation metric, or a contrast metric; and using said determined metric in deciding which plurality of different image recognition processes should be applied to image data captured by a smartphone camera, in order to present information derived from different types of imagery to a user from the smartphone.
 2. The method of claim 1 that includes applying an image recognition process in accordance with said deciding act.
 3. The method of claim 1 that includes applying at least two of a barcode reading function, an optical character recognition function, a facial recognition function, and/or a watermark decoding function as a consequence of said deciding.
 4. The method of claim 1 that includes applying a barcode reading function as a consequence of said deciding.
 5. The method of claim 1 that includes applying an optical character recognition function as a consequence of said deciding.
 6. The method of claim 1 that includes applying a facial recognition function as a consequence of said deciding.
 7. The method of claim 1 that includes applying a watermark decoding function as a consequence of said deciding.
 8. The method of claim 1 that includes receiving the image data from a camera system of a mobile phone device.
 9. The method of claim 1 that further includes using said determined metric in deciding which of said plural image recognition processes not to invoke.
 10. A mobile phone including a processor and a memory, the memory containing non-transitory software instructions causing the processor to perform the method of claim 1.
 11. The method of claim 1 in which said using act includes: comparing the determined metric with a threshold value; if the determined metric is below the threshold value, applying one or more image recognition processes from a first set of processes; and if the determined metric is above the threshold value, applying one or more image recognition processes from a second set of processes different than the first.
 12. The method of claim 1 in which said using act includes: applying the determined metric as an input to a rule-based process to determine which plurality of different recognition processes should be applied; and applying the determined recognition processes to a set of image data.
 13. The method of claim 1 that includes analyzing the received image data to determine a colorfulness metric.
 14. The method of claim 1 that includes analyzing the received image data to determine a color saturation metric.
 15. The method of claim 1 that includes analyzing the received image data to determine a contrast metric.
 16. The method of claim 1 that includes analyzing the received image data to determine a color saturation metric, said analyzing including converting red/green/blue signals into another representation in which color is represented separately from luminance.
 17. The method of claim 16 that includes determining whether more than 50% of the image data has a color saturation of less than 50%, in deciding which plurality of, or in which order plural, different image recognition processes should be applied to the image data captured by the smartphone camera.
 18. The method of claim 16 that includes determining whether more than 50% of the image data has a color saturation of less than 50%, in deciding whether to apply a facial recognition operation to the image data.
 19. A non-transitory computer readable storage medium having stored therein software instructions, said instructions being operative to cause a mobile phone processor programmed thereby to: analyze received image data to determine a colorfulness metric, a color saturation metric, or a contrast metric; and use said determined metric in deciding which plurality of different image recognition processes should be invoked by the mobile phone.
 20. A method comprising: analyzing received image data to determine a colorfulness metric, a color saturation metric, or a contrast metric; comparing the determined metric with a threshold value; if the determined metric is below the threshold value, applying one or more recognition processes from a first set of processes; and if the determined metric is above the threshold value, applying one or more recognition processes from a second set of processes different than the first; wherein the determined metric is used in deciding which plurality of different image recognition processes should be applied to image data captured by a smartphone camera, in order to present information derived from different types of imagery to a user from the smartphone.
 21. The method of claim 20 in which one of said sets includes a barcode reading process, and the other of said sets includes a facial recognition process.
 22. The method of claim 20 in which one of said sets includes a barcode reading process, and the other of said sets includes an object recognition process.
 23. The method of claim 20 in which one of said sets includes an OCR process, and the other of said sets includes a facial recognition process.
 24. The method of claim 20 in which one of said sets includes an OCR process, and the other of said sets includes an object recognition process.
 25. The method of claim 20 that includes analyzing the received image data to determine a colorfulness metric.
 26. The method of claim 20 that includes analyzing the received image data to determine a color saturation metric.
 27. The method of claim 20 that includes analyzing the received image data to determine a contrast metric.
 28. A method comprising: analyzing a first set of image data to compute a colorfulness metric, a color saturation metric, or a contrast metric; applying said computed metric as an input to a rule-based process to determine which of, or in which order, plural different recognition processes should be applied; and applying the determined recognition process(es) to a set of image data; wherein the computed metric is used in deciding which plurality of different image recognition processes should be applied to image data captured by a smartphone camera, in order to present information derived from different types of imagery to a user from the smartphone.
 29. The method of claim 28 that includes applying the determined recognition processes to the first set of image data.
 30. The method of claim 28 that includes applying the determined recognition processes to a second set of image data different than the first set of image data.
 31. The method of claim 28 that includes analyzing the received image data to compute a colorfulness metric.
 32. The method of claim 28 that includes analyzing the received image data to compute a color saturation metric.
 33. The method of claim 28 that includes analyzing the received image data to compute a contrast metric. 